cranelift(aarch64): lower bare ctz/clz boolean tests via tst/cmp+Cond#13336

Closed
ggreif wants to merge 2 commits into
bytecodealliance:mainfrom
ggreif:gabor/ctz-clz-brif-lowering-aarch64

@ggreif
Contributor

@ggreif ggreif commented May 11, 2026

aarch64 analogue of #13334; egraph counterpart in #13332.

Stacked on #13334. The first two commits in this PR are from #13334 (x64); the aarch64-specific change is the third (HEAD) commit. Mark as ready / merge after #13334 lands.

Same shape as the x64 follow-up: specialise is_nonzero (ctz X) / is_nonzero (clz X) (and their ireduce-wrapped variants) in cranelift/codegen/src/isa/aarch64/inst.isle, so the wasm-natural brif (ireduce.i32 (ctz.i64 X)) shape lowers to a single bit-test instead of rbit; clz; cmp; b.cond.

aarch64-specific instructions used:

  • ctz: tst Xn, #1 (logical AND with immediate, flags only) + Cond.Eq — branches when LSB is clear.
  • clz: cmp Xn, #0 + Cond.Pl — branches when sign bit (N flag) is clear, i.e. X is signed-non-negative.
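The two flag tests can be checked against the count instructions with a small model. The sketch below is not Cranelift code: `Flags`, `tst_imm`, `cmp_zero`, and the `cond_*` helpers are hypothetical names that model only the N and Z flags of the 64-bit instruction forms.

```rust
/// Hypothetical model of the two AArch64 condition flags used here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Flags {
    n: bool, // negative: bit 63 of the result
    z: bool, // zero: result == 0
}

/// Models `tst Xn, #imm`: AND the operands and keep only the flags.
fn tst_imm(x: u64, imm: u64) -> Flags {
    let r = x & imm;
    Flags { n: (r >> 63) == 1, z: r == 0 }
}

/// Models `cmp Xn, #0`: subtract zero and keep only the flags.
fn cmp_zero(x: u64) -> Flags {
    Flags { n: (x >> 63) == 1, z: x == 0 }
}

/// `Cond.Eq` holds when Z is set; `Cond.Pl` holds when N is clear.
fn cond_eq(f: Flags) -> bool { f.z }
fn cond_pl(f: Flags) -> bool { !f.n }

/// `is_nonzero (ctz X)` expressed as `tst Xn, #1` + `Cond.Eq`.
fn ctz_is_nonzero_via_tst(x: u64) -> bool { cond_eq(tst_imm(x, 1)) }

/// `is_nonzero (clz X)` expressed as `cmp Xn, #0` + `Cond.Pl`.
fn clz_is_nonzero_via_cmp(x: u64) -> bool { cond_pl(cmp_zero(x)) }

fn main() {
    for &x in &[0u64, 1, 2, 8, u64::MAX, 1 << 63, (1 << 63) - 1] {
        assert_eq!(ctz_is_nonzero_via_tst(x), x.trailing_zeros() != 0);
        assert_eq!(clz_is_nonzero_via_cmp(x), x.leading_zeros() != 0);
    }
    println!("flag tests agree with ctz/clz on all samples");
}
```

The zero input is the edge case worth noting: both counts equal the bit width there, so both conditions hold, and the model agrees because bit 0 and bit 63 are both clear.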

Test deltas (tests/disas/aarch64-ctz-clz-bool-condition.wat, newly added):

| consumer | before | after |
| --- | --- | --- |
| if_ctz_bare_i32 | 4 insns (rbit + clz + ...) | 2 (tst w4, #1; b.eq) |
| if_ctz_bare_i64 | 4 insns | 2 (tst x4, #1; b.eq) |
| if_clz_bare_i32 | 4 insns (clz + ...) | 2 (cmp w4, #0; b.pl) |

Negative test ((ctz X) == 4) correctly untouched. Same motivation as #13334 — closes the gap for non-Rust wasm frontends like Motoko's moc.

riscv64 and s390x to follow.

ggreif and others added 2 commits May 11, 2026 17:57
Follow-up to bytecodealliance#13332. That PR added egraph rules collapsing
`(eq (ctz X) 0)` / `(ne (ctz X) 0)` / clz analogues to direct
LSB / sign-bit tests — but only when the comparison is mediated by an
explicit `icmp`. The wasm front-end translates `wasm if (ctz X)` to
`brif (ireduce.i32 (ctz.i64 X))` directly (no `icmp`), so the egraph
rules don't fire on the wasm-natural shape.

This commit closes the gap by specialising `is_nonzero` in the x64
backend — the helper that all `brif`/`select`/`trapif` lowerings
funnel through. Four rules: `ctz`/`clz` × bare/`ireduce`-wrapped.

The `ireduce` variant catches the wasm front-end's `i32.wrap_i64`
over a 64-bit `ctz`/`clz` — a no-op on values in [0, bitwidth].
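
Why the `ireduce` wrapper is harmless can be seen in a few lines. This is an illustrative sketch only: `ctz_i64` and `ireduce_i32` are hypothetical stand-ins for the CLIF operations, not Cranelift APIs.

```rust
/// Models `ctz.i64`: the count as a 64-bit value, in the range 0..=64.
fn ctz_i64(x: u64) -> u64 {
    u64::from(x.trailing_zeros())
}

/// Models `ireduce.i32`: keep only the low 32 bits.
fn ireduce_i32(v: u64) -> u32 {
    v as u32
}

/// The condition `brif` sees, with and without the `ireduce` wrapper.
fn is_nonzero_reduced(x: u64) -> bool { ireduce_i32(ctz_i64(x)) != 0 }
fn is_nonzero_bare(x: u64) -> bool { ctz_i64(x) != 0 }

fn main() {
    for &x in &[0u64, 1, 2, 1 << 32, 1 << 63, u64::MAX] {
        // The count never exceeds 64, so truncation cannot change it.
        assert_eq!(u64::from(ireduce_i32(ctz_i64(x))), ctz_i64(x));
        assert_eq!(is_nonzero_reduced(x), is_nonzero_bare(x));
    }
    // Contrast: reducing X itself (rather than the count) loses information.
    assert_ne!(u64::from(ireduce_i32(1 << 32)), 1u64 << 32);
    println!("ireduce preserves ctz results");
}
```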

Test deltas (tests/disas/ctz-clz-bool-condition.wat):

  if_ctz_bare_i32:   5 insns -> 2 (testl $1, %edx; je)
  if_ctz_bare_i64:   5 insns -> 2 (testq $1, %rdx; je)
  if_clz_bare_i32:   7 insns -> 2 (testl %edx, %edx; jns)

The icmp-mediated cases (collapsed by bytecodealliance#13332's egraph rules) are
unchanged. The numeric-comparison negative test stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aarch64 analogue of the x64 follow-up. Specialises `is_nonzero (ctz X)`
and `is_nonzero (clz X)` (plus their `ireduce`-wrapped variants) so the
wasm-natural `brif (ireduce.i32 (ctz.i64 X))` shape lowers to a single
bit-test instead of `rbit; clz; cmp; b.cond`.

  ctz: `tst Xn, #1` + `Cond.Eq` — branches when LSB is clear.
  clz: `cmp Xn, #0` + `Cond.Pl` — branches when sign bit is clear.

Test deltas (tests/disas/aarch64-ctz-clz-bool-condition.wat):

  if_ctz_bare_i32:   `tst w4, #1; b.eq`
  if_ctz_bare_i64:   `tst x4, #1; b.eq`
  if_clz_bare_i32:   `cmp w4, #0; b.pl`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ggreif ggreif changed the title cranelift(aarch64): lower bare ctz/clz boolean tests via tst/cmp+Cond cranelift(aarch64): lower bare ctz/clz boolean tests via tst/cmp+Cond May 11, 2026
@github-actions github-actions Bot added the cranelift, cranelift:area:aarch64, and cranelift:area:x64 labels May 11, 2026
@cfallin
Member

cfallin commented May 12, 2026

Ideally we would do this in the mid-end, not in every back-end individually (and then revert #13334 as well).

In principle this would work by matching on the brif, not the clz -- your mid-end PR only considered simplifications of the clz itself, which is why you didn't see this option I think. In other words, it's not valid to simplify (clz x) to a bare x & 1, but when used as a condition to a brif, where all we are testing is whether the result is nonzero, it is. So the whole (brif (clz x) a b) should simplify to (brif (band x (iconst 1)) a b) more or less (some syntactical details elided).
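
The distinction between rewriting the value and rewriting the branch condition can be sketched as follows. The helper names are hypothetical, and the sketch fills in the details the comment elides: for `ctz` the relevant bit is the LSB with inverted polarity, and for `clz` it is the sign bit.

```rust
/// Value level: `ctz x` and `x & 1` are different numbers in general.
fn ctz(x: u32) -> u32 { x.trailing_zeros() }
fn lsb(x: u32) -> u32 { x & 1 }

/// Condition level: does `brif v a b` take the `a` edge, i.e. is `v != 0`?
fn takes_then_ctz(x: u32) -> bool { ctz(x) != 0 }
fn takes_then_lsb_clear(x: u32) -> bool { lsb(x) == 0 }
fn takes_then_clz(x: u32) -> bool { x.leading_zeros() != 0 }
fn takes_then_non_negative(x: u32) -> bool { (x as i32) >= 0 }

fn main() {
    // Replacing the value itself would be wrong: ctz(8) is 3, lsb(8) is 0.
    assert_ne!(ctz(8), lsb(8));

    // As a branch condition, only zero versus nonzero matters.
    let samples = (0u32..=1024).chain([(1 << 31) - 1, 1 << 31, u32::MAX]);
    for x in samples {
        assert_eq!(takes_then_ctz(x), takes_then_lsb_clear(x));
        assert_eq!(takes_then_clz(x), takes_then_non_negative(x));
    }
    println!("condition-level rewrites agree on all samples");
}
```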

I can't seem to get the simplify_skeleton entry point to work for brif right now (quick experiment, I haven't dug into it much); but that's the shape of what we'd want. Does that make sense?

@ggreif
Contributor Author

ggreif commented May 12, 2026

Closing in favor of #13343: the mid-end simplify_skeleton-on-brif extension covers the same cases and produces strictly better aarch64 code, namely tbz w0, #0 (a single-instruction test-and-branch) versus this PR's two-instruction sequences (tst or cmp, followed by b.cond). See the comparison table in #13343.

@ggreif ggreif closed this May 12, 2026
ggreif added a commit to ggreif/wasmtime that referenced this pull request May 12, 2026
…keleton`

The mid-end rules added in bytecodealliance#13332 hinge on an `icmp eq/ne (ctz/clz X) 0`
shape — i.e. the wasm 3-op pattern `i32.ctz; i32.eqz; br_if`. Frontends
that emit the 2-op form `i32.ctz; br_if` (e.g. Motoko's `moc` after its
`and 1; eqz; br_if` → `ctz; br_if` byte-size peephole) feed `(brif (ctz X))`
into cranelift with no `icmp` for the existing rules to match.

This commit extends `simplify_skeleton` to rewrite the *condition operand*
of an existing `brif` in place, without touching its opcode or successor
blocks (CFG-preserving by construction). A new `SkeletonInstSimplification`
variant `ReplaceBranchCond(Value)` carries the new condition; the egraph
driver applies it by writing through `inst_args_mut`. Two ISLE rules in
`opts/icmp.isle` rewrite `(brif (ctz X) bt be)` and `(brif (clz X) bt be)`
to brifs over the equivalent bit-extract form:

  brif (ctz X) bt be   →   brif (eq (band X 1) 0) bt be
  brif (clz X) bt be   →   brif (sge X 0)         bt be
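
A toy model of the in-place condition rewrite may help. Only the names `SkeletonInstSimplification` and `ReplaceBranchCond` come from the commit message; `Value`, `Brif`, `simplify_brif`, `apply`, `eval`, and `target` are hypothetical and much simpler than Cranelift's real data structures and ISLE rules.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Arg, // the input X
    Const(i64),
    Ctz(Box<Value>),
    Clz(Box<Value>),
    Band(Box<Value>, Box<Value>),
    IcmpEq(Box<Value>, Box<Value>),
    IcmpSge(Box<Value>, Box<Value>),
}

#[derive(Debug, Clone, PartialEq)]
struct Brif {
    cond: Value,
    then_block: u32,
    else_block: u32,
}

#[derive(Debug)]
enum SkeletonInstSimplification {
    ReplaceBranchCond(Value),
}

/// Proposes a new condition for `brif (ctz X)` and `brif (clz X)` only.
fn simplify_brif(b: &Brif) -> Option<SkeletonInstSimplification> {
    let konst = |c| Box::new(Value::Const(c));
    let new_cond = match &b.cond {
        Value::Ctz(x) => Value::IcmpEq(Box::new(Value::Band(x.clone(), konst(1))), konst(0)),
        Value::Clz(x) => Value::IcmpSge(x.clone(), konst(0)),
        _ => return None,
    };
    Some(SkeletonInstSimplification::ReplaceBranchCond(new_cond))
}

/// Applies the simplification by overwriting the condition operand only.
fn apply(b: &mut Brif, s: SkeletonInstSimplification) {
    match s {
        SkeletonInstSimplification::ReplaceBranchCond(v) => b.cond = v,
    }
}

/// Evaluates a condition for a 64-bit input `x`.
fn eval(v: &Value, x: i64) -> i64 {
    match v {
        Value::Arg => x,
        Value::Const(c) => *c,
        Value::Ctz(a) => i64::from((eval(a, x) as u64).trailing_zeros()),
        Value::Clz(a) => i64::from((eval(a, x) as u64).leading_zeros()),
        Value::Band(a, b) => eval(a, x) & eval(b, x),
        Value::IcmpEq(a, b) => i64::from(eval(a, x) == eval(b, x)),
        Value::IcmpSge(a, b) => i64::from(eval(a, x) >= eval(b, x)),
    }
}

/// The successor a `brif` picks: `then_block` when the condition is nonzero.
fn target(b: &Brif, x: i64) -> u32 {
    if eval(&b.cond, x) != 0 { b.then_block } else { b.else_block }
}

fn main() {
    for cond in [Value::Ctz(Box::new(Value::Arg)), Value::Clz(Box::new(Value::Arg))] {
        let before = Brif { cond, then_block: 1, else_block: 2 };
        let mut after = before.clone();
        let s = simplify_brif(&after).expect("ctz/clz conditions should simplify");
        apply(&mut after, s);
        // Successor blocks are untouched, so the CFG is preserved.
        assert_eq!((after.then_block, after.else_block), (1, 2));
        for x in [0i64, 1, 2, 8, -1, i64::MIN, i64::MAX] {
            assert_eq!(target(&before, x), target(&after, x));
        }
    }
    println!("rewritten brifs pick the same successor");
}
```

Because `apply` writes only the `cond` field, the sketch cannot change which blocks the branch refers to, which mirrors the CFG-preserving argument in the commit message.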

End-to-end lowering on the resulting brif then composes with existing
backend `icmp+brif` fusion to produce:

  x86_64  brif (ctz X):   `testl $1, %edi; je`
  x86_64  brif (clz X):   `testl %edi, %edi; jge`
  aarch64 brif (ctz X):   `tbz w0, #0` — single-instruction test-and-branch
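
The instruction sequences above can be sanity-checked with a small flag model. This is a sketch under stated assumptions: `Flags`, `testl`, `jns`, `jge`, and `tbz` are hypothetical helpers that model only the architectural behavior needed here (`test` clears OF and sets SF from the result's top bit; `tbz` branches when the selected bit is zero).

```rust
/// Hypothetical model of the x86 flags read by `jns` and `jge`.
struct Flags {
    sf: bool, // sign flag: bit 31 of the result
    of: bool, // overflow flag: always cleared by `test`
}

/// Models `testl a, b`: AND the operands and keep only the flags.
fn testl(a: u32, b: u32) -> Flags {
    let r = a & b;
    Flags { sf: (r >> 31) == 1, of: false }
}

/// `jns` is taken when SF is clear; `jge` is taken when SF == OF.
fn jns(f: &Flags) -> bool { !f.sf }
fn jge(f: &Flags) -> bool { f.sf == f.of }

/// Models AArch64 `tbz Wn, #bit`: taken when the selected bit is zero.
fn tbz(x: u32, bit: u32) -> bool {
    (x >> bit) & 1 == 0
}

fn main() {
    for &x in &[0u32, 1, 2, 8, (1 << 31) - 1, 1 << 31, u32::MAX] {
        let f = testl(x, x);
        // After `test`, OF is zero, so `jns` and `jge` coincide.
        assert_eq!(jns(&f), jge(&f));
        assert_eq!(jge(&f), x.leading_zeros() != 0);
        // `tbz w0, #0` is taken exactly when ctz is nonzero.
        assert_eq!(tbz(x, 0), x.trailing_zeros() != 0);
    }
    println!("jns/jge and tbz match the clz/ctz conditions");
}
```

This also shows why the backend-side rules emitted `jns` while the mid-end route ends in `jge`: after `test` the two conditions are equivalent.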

This subsumes the backend-side x64 rules added in bytecodealliance#13334 and the aarch64
rules in bytecodealliance#13336 (and yields tighter aarch64 code than bytecodealliance#13336 did).

The driver still rejects non-`brif` branches and rejects non-`ReplaceBranchCond`
simplification variants on `brif` (a `Replace inst` of a brif would risk
changing successor block IDs and is left to a future, broader extension).

Filetest `egraph/brif-cnt-cond.clif` covers ctz/clz over i32/i64 in the
2-op form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot pushed a commit to eduardomourar/wasmtime that referenced this pull request May 13, 2026
…keleton` (bytecodealliance#13343)

* cranelift: fold `ctz`/`clz` directly into `brif` cond via `simplify_skeleton`

* rustfmt: collapse `is_branch` && opcode-guard onto one line

* tests/disas: re-bless ctz/clz-bool-condition for new mid-end fold

The new `simplify_skeleton`-on-`brif` rule rewrites the 2-op
`if (ctz/clz x)` cases that bytecodealliance#13332's commentary noted were the
non-icmp-mediated holdouts. Bare-form lowering shrinks from
~9 instructions (bsf/bsr + cmov + test + jne + …) to
`testl $1, %edx; je` (ctz) and `testl %edx, %edx; jge` (clz).

Offsets on the subsequent non-bare functions shift down to match.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>